Efficient Two-Stage Genome-Wide Association Designs Based on False Positive Report Probabilities
Abstract
Despite recent advances, very-high-throughput (VHT) technologies capable of genotyping hundreds of thousands of SNPs in individual samples remain prohibitively expensive for the large studies needed to screen substantial sections of the genome for variants with modest effects on disease risk. This paper presents a two-stage strategy in which a portion of the available samples is genotyped with VHT technology, and a small number of the most promising variants are then genotyped with standard high-throughput techniques in the remaining samples as an independent replication study. The sample sizes in the first and second stages and the corresponding significance levels are chosen to limit the False Positive Report Probability (FPRP) while maximizing the number of Expected True Positives (ETPs). (The FPRP is the conditional probability that a marker is not truly associated with disease, given a significant test for disease-marker association.) For a fixed budget, the two-stage strategy has greater power (a larger number of ETPs) than the single-stage strategy, in which all subjects are genotyped using expensive VHT technology. Furthermore, concentrating on the FPRP leads to considerable savings relative to strategies designed to control the family-wise error rate (e.g. Bonferroni correction). The FPRP and number of ETPs can also accommodate researchers' prior beliefs about the number of causal loci and the magnitude of their effects. The expected number of false positives does not change if the true number and effects of causal loci differ from the specified prior (although the false discovery rate will vary), thus limiting the absolute amount of resources spent chasing "false leads."
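To make the quantities in the abstract concrete, here is a minimal Python sketch of how the FPRP and the number of ETPs could be computed for a two-stage design. It is an illustration under simplifying assumptions, not the paper's actual procedure: every marker is assumed to share the same per-stage significance level and power, the two stages are treated as independent so that a marker is reported only if it is significant in both, and the prior is taken to be the fraction of markers that are causal. The function and parameter names (`fprp`, `two_stage_summary`, `alpha1`, `power1`, and so on) are hypothetical.

```python
def fprp(alpha, power, prior):
    """False Positive Report Probability via Bayes' rule.

    alpha: probability a null marker tests significant
    power: probability a truly associated marker tests significant
    prior: prior probability that a marker is truly associated
    """
    false_pos = alpha * (1.0 - prior)   # P(significant and null)
    true_pos = power * prior            # P(significant and associated)
    return false_pos / (false_pos + true_pos)


def two_stage_summary(n_markers, n_causal, alpha1, power1, alpha2, power2):
    """Summarize a two-stage design (illustrative assumptions only).

    Assumes independent stages: a marker is reported only if it is
    significant in stage 1 AND in the stage 2 replication sample.
    """
    alpha = alpha1 * alpha2             # overall false positive rate per null marker
    power = power1 * power2             # overall power per causal marker
    prior = n_causal / n_markers        # assumed prior: fraction of causal markers
    etp = n_causal * power              # Expected True Positives
    efp = (n_markers - n_causal) * alpha  # expected false positives
    return {"ETP": etp, "EFP": efp, "FPRP": fprp(alpha, power, prior)}


if __name__ == "__main__":
    # Hypothetical design values chosen only for illustration.
    summary = two_stage_summary(
        n_markers=500_000, n_causal=10,
        alpha1=0.001, power1=0.8,
        alpha2=0.0001, power2=0.9,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the FPRP equals the expected false positives divided by the expected total positives, so tightening either stage's significance level lowers the FPRP at the cost of fewer ETPs. A design search would vary the stage sample sizes (which drive `power1` and `power2`) and the significance levels subject to a budget constraint.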
Related resources
Programs for calculating the statistical powers of detecting susceptibility genes in case–control studies based on multistage designs
MOTIVATION A two-stage association study is the most commonly used method among multistage designs to efficiently identify disease susceptibility genes. Recently, some SNP studies have utilized more than two stages to detect disease genes. However, there are few available programs for calculating statistical powers and positive predictive values (PPVs) of arbitrary n-stage designs. RESULTS We...
Comparative analysis of different approaches for dealing with candidate regions in the context of a genome-wide association study
Genome-wide association studies (GWAS) test hundreds of thousands of single-nucleotide polymorphisms (SNPs) for association to a trait, treating each marker equally and ignoring prior evidence of association to specific regions. Typically, promising regions are selected for further investigation based on p-values obtained from simple tests of association. However, loci that exert only a weak, l...
Optimal two-stage genome-wide association designs based on false discovery rate
Genome-wide association studies are likely to be conducted in large scale in the near future. In such studies, searching over hundreds of thousands of markers for the few ones that are associated with disease brings out the multiple-hypothesis testing problem in its severe form. We explore, in a two-stage design, how the use of false discovery rate (FDR) can alleviate the burden of a prohibitiv...
Optimal designs for two-stage genome-wide association studies.
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in s...
Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control
Although gene-environment (G× E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G...
Journal: Pacific Symposium on Biocomputing
Publication date: 2006